160 research outputs found

An Evaluation of Classification and Outlier Detection Algorithms

Author: Austin Jim
Hodge Victoria J.
Publication date: 02/05/2018

This paper evaluates algorithms for classification and outlier detection accuracies in temporal data. We focus on algorithms that train and classify rapidly and can be used for systems that need to incorporate new data regularly. Hence, we compare the accuracy of six fast algorithms using a range of well-known time-series datasets. The analyses demonstrate that the choice of algorithm is task- and data-specific but that we can derive heuristics for choosing among them. Gradient Boosting Machines are generally best for classification, but there is no single winner for outlier detection, although Gradient Boosting Machines (again) and Random Forest perform better than the rest. Hence, we recommend running evaluations of a number of algorithms using our heuristics.

arXiv.org e-Print Archive

White Rose Research Online

A Binary Neural Shape Matcher using Johnson Counters and Chain Codes

Author: Austin Jim
Hodge Victoria
O'Keefe Simon
Publication date: 10/03/2006

In this paper, we introduce a neural network-based shape matching algorithm that uses Johnson Counter codes coupled with chain codes. Shape matching is a fundamental requirement in content-based image retrieval systems. Chain codes describe shapes using sequences of numbers. They are simple and flexible. We couple this power with the efficiency and flexibility of a binary associative-memory neural network. We focus on the implementation details of the algorithm when it is constructed using the neural network. We demonstrate how the binary associative-memory neural network can index and match chain codes where the chain code elements are represented by Johnson codes.
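The pairing of chain codes with Johnson codes described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 8-direction numbering (0 = east, counting counter-clockwise), the 4-bit counter width, and the function names `johnson_code`, `chain_code` and `encode_chain` are all assumptions made for illustration. The sketch shows one useful property of Johnson codes: the Hamming distance between two codes equals the circular distance between the directions they represent.

```python
# Assumed convention: Freeman 8-direction codes, 0 = east, counter-clockwise, y axis up.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def johnson_code(value, n_bits=4):
    """Return the Johnson (twisted-ring) counter state for value in [0, 2 * n_bits)."""
    if not 0 <= value < 2 * n_bits:
        raise ValueError("value out of range for this counter width")
    if value <= n_bits:
        # Ones fill in from the left: 0000, 1000, 1100, 1110, 1111
        return tuple([1] * value + [0] * (n_bits - value))
    # Then zeros fill in from the left: 0111, 0011, 0001
    zeros = value - n_bits
    return tuple([0] * zeros + [1] * (n_bits - zeros))


def chain_code(points):
    """Convert a path of 8-connected (x, y) points into a list of direction codes."""
    return [
        DIRECTIONS[(x1 - x0, y1 - y0)]
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


def encode_chain(chain, n_bits=4):
    """Concatenate the Johnson codes of every chain element into one bit vector."""
    bits = []
    for direction in chain:
        bits.extend(johnson_code(direction, n_bits))
    return bits


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    chain = chain_code(square)
    print(chain, encode_chain(chain))
```

Because neighbouring directions differ by a single bit, a bitwise comparison of two encoded chains degrades gradually as the shapes diverge, which is one plausible reason such a code suits a binary associative memory.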

White Rose Research Online

An Evaluation of Phonetic Spell Checkers

Author: Austin Jim
Hodge Victoria Jane
Publication date: 01/09/2001

In the work reported here, we describe a phonetic spell-checking algorithm integrating aspects of Soundex and Phonix. We increase the number of letter codes compared to Soundex and Phonix. We also integrate phonetic rules but use far fewer than Phonix, where retrieval may be slow due to the computational cost of comparing the input to a large list of transformation rules. Our algorithm aims to repair spelling errors where the user has substituted homophones in place of the correct spelling. We evaluate our algorithm by comparing it to three alternative spell-checking algorithms and three benchmark spell checkers (MS Word 97 & 2000 and UNIX 'ispell') using a list of phonetic spelling errors. We find that our approach has superior recall (percentage of correct matches retrieved) to the alternative approaches, although the higher recall is at the expense of precision (number of possible matches retrieved). We intend our phonetic spell checker to be integrated into an existing spell checker, where the integration will improve precision; high recall is therefore the aim of our approach in this paper.
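For readers unfamiliar with the baseline that this abstract builds on, the following is a minimal sketch of standard American Soundex. It is not the authors' algorithm, which uses more letter codes and adds phonetic rules; the function name `soundex` and the handling of non-letter characters are assumptions made for illustration.

```python
# Standard Soundex letter groups; vowels, H, W and Y carry no code.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key (letter plus three digits) for a word."""
    # Assumption: anything outside A-Z is simply ignored.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return ("".join(key) + "000")[:4]


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Smith", "Smyth"):
        print(name, soundex(name))
```

Words that sound alike tend to share a key, which is what lets a phonetic spell checker retrieve candidates for homophone errors. The small code alphabet also explains the precision cost mentioned in the abstract: many unrelated words collapse onto the same key.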

White Rose Research Online

Win Prediction in Esports: Mixed-Rank Match Prediction in Multi-player Online Battle Arena Games

Author: Block Florian
Cowling Peter
Devlin Sam
Drachen Anders
Hodge Victoria
Sephton Nick
Publication date: 17/11/2017

Esports has emerged as a popular genre for players as well as spectators, supporting a global entertainment industry. Esports analytics has evolved to address the requirement for data-driven feedback, and is focused on cyber-athlete evaluation, strategy and prediction. Towards the latter, previous work has used match data from a variety of player ranks from hobbyist to professional players. However, professional players have been shown to behave differently than lower ranked players. Given the comparatively limited supply of professional data, a key question is thus whether mixed-rank match datasets can be used to create data-driven models which predict winners in professional matches and provide a simple in-game statistic for viewers and broadcasters. Here we show that, although there is a slightly reduced accuracy, mixed-rank datasets can be used to predict the outcome of professional matches, with suitably optimized configurations.

arXiv.org e-Print Archive

White Rose Research Online

A comparison of a novel neural spell checker and standard spell checking algorithms

Author: Cherkassky
Hodge
Jim Austin
Kukich
Turner
Ullman
Victoria J. Hodge
Wu
Publication venue: Elsevier BV
Publication date: 01/11/2002

In this paper, we propose a simple and flexible spell checker using efficient associative matching in the AURA modular neural system. Our approach aims to provide a pre-processor for an information retrieval (IR) system allowing the user's query to be checked against a lexicon and any spelling errors corrected, to prevent wasted searching. IR searching is computationally intensive, so preventing futile searches minimises computational cost. We evaluate our approach against several commonly used spell checking techniques for memory use, retrieval speed and recall accuracy. The proposed methodology has low memory use, high speed for word presence checking, reasonable speed for spell checking and a high recall rate.

Crossref

White Rose Research Online

A Binary Neural Network Framework for Attribute Selection and Prediction

Author: Austin Jim
Hodge Victoria Jane
Jackson Tom
Publication venue: Scitepress
Publication date: 05/10/2012

In this paper, we introduce an implementation of the attribute selection algorithm, Correlation-based Feature Selection (CFS), integrated with our k-nearest neighbour (k-NN) framework. Binary neural networks underpin our k-NN and allow us to create a unified framework for attribute selection, prediction and classification. We apply the framework to a real-world application of predicting bus journey times from traffic sensor data and show how attribute selection can both speed up our k-NN and increase the prediction accuracy by removing noise and redundant attributes from the data.
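The idea behind Correlation-based Feature Selection can be sketched with the commonly cited CFS merit score, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean attribute-to-target correlation and r_ff the mean attribute-to-attribute correlation of a k-attribute subset. The sketch below is a generic illustration, not the binary neural implementation the abstract describes: the use of Pearson correlation, the greedy forward search, and the names `pearson`, `cfs_merit` and `forward_select` are assumptions.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cfs_merit(columns, target, subset):
    """CFS merit of a subset of column indices: rewards relevance, penalises redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[j], target)) for j in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns, target):
    """Greedy forward search (an assumed strategy): add attributes while merit improves."""
    remaining = list(range(len(columns)))
    chosen, best = [], 0.0
    while remaining:
        score, feature = max(
            (cfs_merit(columns, target, chosen + [j]), j) for j in remaining
        )
        if score <= best:
            break
        chosen.append(feature)
        remaining.remove(feature)
        best = score
    return chosen


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5, 6]
    columns = [[1, 2, 3, 4, 5, 7], [1, -1, 1, -1, 1, -1]]  # toy data, not from the paper
    print(forward_select(columns, target))
```

An attribute that correlates weakly with the target lowers the merit of the subset, so the search drops it; this is the sense in which attribute selection removes noise and redundant attributes before prediction.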

White Rose Research Online

The Stellar Populations of NGC 3109: Another Dwarf Irregular Galaxy with a Population II Stellar Halo

Author: Albert A. Zijlstra
Bertelli G.
Chiosi C.
Dante Minniti
Hodge P.
M. Victoria Alonso
Publication venue: University of Chicago Press
Publication date: 29/10/1998

We have obtained V and I-band photometry for about 17500 stars in the field of the dwarf irregular galaxy NGC3109, located in the outskirts of the Local Group. The photometry allows us to study the stellar populations present inside and outside the disk of this galaxy. From the VI color-magnitude diagram we infer metallicities and ages for the stellar populations in the main body and in the halo of NGC3109. The stars in the disk of this galaxy have a wide variety of ages, including very young stars with ages of approximately 10^7 yr. Our main result is to establish the presence of a halo consisting of population II stars, extending out to about 4.5 arcmin (or 1.8 kpc) above and below the plane of this galaxy. For these old stars we derive an age of > 10 Gyr and a metallicity of [Fe/H] = -1.8 +/- 0.2. We construct a deep luminosity function, obtaining an accurate distance modulus (m-M)_0 = 25.62 +/- 0.1 for this galaxy based on the I-magnitude of the red giant branch (RGB) tip and adopting E(V-I) = 0.05. Comment: Accepted for publication in the Astronomical Journal; 23 pages, LaTeX, 12 figures (Fig. 1 not available in electronic format).
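The numbers quoted in this abstract can be cross-checked with the standard distance-modulus relation, d = 10^((m-M)_0 / 5 + 1) pc, and the small-angle approximation. The sketch below is an illustrative back-of-the-envelope calculation, not part of the paper; the function names are hypothetical.

```python
import math


def distance_pc(distance_modulus):
    """Distance in parsecs from an extinction-corrected distance modulus (m - M)_0."""
    return 10 ** (distance_modulus / 5 + 1)


def projected_size_pc(angle_arcmin, dist_pc):
    """Physical size subtended by a small angle at a given distance (small-angle approx.)."""
    return math.radians(angle_arcmin / 60) * dist_pc


if __name__ == "__main__":
    d = distance_pc(25.62)              # distance modulus quoted in the abstract
    halo = projected_size_pc(4.5, d)    # halo extent of about 4.5 arcmin
    print(f"distance = {d / 1e6:.2f} Mpc, halo extent = {halo / 1e3:.2f} kpc")
```

With (m-M)_0 = 25.62 this gives a distance of roughly 1.3 Mpc and a halo extent of roughly 1.7 kpc, close to the approximate 1.8 kpc quoted in the abstract given the rounding of the angle. The stated uncertainty of 0.1 mag corresponds to about 5% in distance.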

arXiv.org e-Print Archive

Crossref

CERN Document Server

Integrating Information Retrieval & Neural Networks

Author: Hodge Victoria Jane
Publication venue: Department of Computer Science, University of York
Publication date: 01/12/2001

Due to the proliferation of information in databases and on the Internet, users are overwhelmed, leading to information overload. It is impossible for humans to index and search such a wealth of information by hand, so automated indexing and searching techniques are required. In this dissertation, we explore current Information Retrieval (IR) techniques and their shortcomings, and we consider how more sophisticated approaches can be developed to aid retrieval. Current techniques can be slow due to the sheer volume of the search space, although faster ones are being developed. Matching is often poor, as the quantity of retrievals does not necessarily indicate quality retrievals. Many current approaches simply return the documents containing the greatest number of 'query words'. A methodology is desired to: process documents unsupervised; generate an index using a data structure that is memory efficient, speedy, incremental and scalable; identify spelling mistakes in the query and suggest alternative spellings; handle paraphrasing of documents and synonyms for both indexing and searching; focus retrieval by minimising the search space; and, finally, calculate the query-document similarity from statistics autonomously derived from the text corpus. We describe our IR system named MinerTaur, developed using both the AURA modular neural system and a hierarchical, growing self-organising neural technique based on Growing Cell Structures which we call TreeGCS. We integrate three modules in MinerTaur: a spell checker; a hierarchical thesaurus generated from corpus statistics inferred by the system; and a word-document matrix to efficiently store the associations between the documents and their constituent words. We describe each module individually and evaluate each against comparative data structures and benchmark implementations. We identify improved memory usage, spelling recall accuracy, cluster quality and training and recall times for the modules.
Finally, we compare MinerTaur against a benchmark IR system, SMART, developed at Cornell University, and reveal superior recall and precision for MinerTaur versus SMART.

White Rose Research Online

Wireless Sensor Networks for Condition Monitoring in the Railway Industry : a Survey

Author: Hodge Victoria Jane
Moulds Anthony
O'Keefe Simon
Weeks Michael
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2015

In recent years, the range of sensing technologies has expanded rapidly, whereas sensor devices have become cheaper. This has led to a rapid expansion in condition monitoring of systems, structures, vehicles, and machinery using sensors. Key factors are the recent advances in networking technologies such as wireless communication and mobile ad hoc networking coupled with the technology to integrate devices. Wireless sensor networks (WSNs) can be used for monitoring the railway infrastructure such as bridges, rail tracks, track beds, and track equipment along with vehicle health monitoring such as chassis, bogies, wheels, and wagons. Condition monitoring reduces human inspection requirements through automated monitoring, reduces maintenance through detecting faults before they escalate, and improves safety and reliability. This is vital for the development, upgrading, and expansion of railway networks. This paper surveys wireless sensor network technology for monitoring in the railway industry for analyzing systems, structures, vehicles, and machinery. This paper focuses on practical engineering solutions, principally, which sensor devices are used and what they are used for; and the identification of sensor configurations and network topologies. It identifies their respective motivations and distinguishes their advantages and disadvantages in a comparative review.

Crossref

White Rose Research Online

Discretisation of Data in a Binary Neural k-Nearest Neighbour Algorithm

Author: Austin Jim
Hodge Victoria Jane
Publication date: 01/06/2012

This paper evaluates several methods of discretisation (binning) within a k-Nearest Neighbour predictor. Our k-NN is constructed using binary neural networks, which require continuous-valued data to be discretised so that it can be mapped to the binary neural framework. Our approach uses discretisation coupled with robust encoding to map data sets onto the binary neural network. In this paper, we compare seven unsupervised discretisation methods for retrieval accuracy (prediction accuracy) across a range of well-known prediction data sets comprising time-series data. We analyse whether there is an optimal discretisation configuration for our k-NN. The analyses demonstrate that the configuration is data specific. Hence, we recommend running evaluations of a number of configurations, varying both the discretisation methods and the number of discretisation bins, using a test data set. This evaluation will pinpoint the optimum configuration for new data sets.
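To make the notion of unsupervised discretisation concrete, the sketch below shows two widely used schemes, equal-width and equal-frequency binning. The abstract does not name the seven methods compared, so these two are chosen purely for illustration, and the function names are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign values to bins by rank so each bin holds roughly the same number of items."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Note: tied values may be split across adjacent bins in this simple version.
        bins[idx] = rank * n_bins // len(values)
    return bins


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100, 200]  # toy data with two large values
    print(equal_width_bins(data, 2))
    print(equal_frequency_bins(data, 2))
```

On skewed data the two schemes disagree sharply: equal-width binning places nearly every value in one bin, while equal-frequency binning splits the data evenly. This kind of sensitivity is consistent with the abstract's conclusion that the best configuration is data specific.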

White Rose Research Online